Enrichment of transcriptional regulatory sites in non-coding genomic region

Authors

  • Wen Xue
  • Jin Wang
  • Zhirong Shen
  • Huaiqiu Zhu

Abstract

MOTIVATION: Over-represented k-mers in non-coding genomic regions often lead to the identification of potential transcriptional regulatory sites (TRS). Many algorithms exploit this phenomenon to predict TRS in silico, yet improving these algorithms requires a deeper understanding of the enrichment feature itself. To obtain a general distributional profile of TRS across different regions of a genome and across different genomes, we performed a systematic analysis of TRS over-representation in the intergenic and gene upstream regions of yeast and viral genomes, and of the distributional pattern of TRS in the intergenic and intron regions of the Drosophila genome. We also explored how the accuracy of TRS consensus sequences can be evaluated by measuring their enrichment.

RESULTS: To measure enrichment, we introduced a statistical background model that compares the TRS frequency in a given genomic region with either its frequency in the whole genome or its frequency in exon regions. This model was applied to different classes of non-coding genomic regions in four genomes. Most TRS were over-represented in the intergenic regions of the Saccharomyces cerevisiae, Schizosaccharomyces pombe and Epstein-Barr virus (EBV) genomes. The enrichment of S. cerevisiae TRS in the 600 bp upstream regions of genes was also significant. In the Drosophila genome, TRS showed no enrichment in intergenic or intron regions when the whole-genome TRS frequency was used as the background, as it was for the other genomes. However, when the TRS frequency in exon regions was used as the background, over 70% of TRS were over-represented in those two classes of non-coding regions, indicating the existence of transcriptional regulatory signals in introns. Analysis of several S. cerevisiae TRS whose inconsistent consensus sequences show different levels of enrichment in intergenic regions suggests that the accuracy of experimentally determined TRS could be evaluated by measuring their enrichment in non-coding genomic regions.
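
The abstract states that enrichment is measured by comparing a TRS's frequency in a region against a background frequency (whole genome or exons), but it does not give the exact statistic. The sketch below is one plausible reading, not the authors' model: it assumes consensus sequences are written in IUPAC notation, counts overlapping matches on both strands, takes the expected count as the background per-position frequency times the number of candidate positions in the region, and reports an observed/expected ratio with a Poisson upper-tail p-value. All function names (`site_counts`, `enrichment`, `poisson_upper_tail`, etc.) are hypothetical.

```python
import math
import re

# IUPAC nucleotide codes mapped to regex character classes. Assumption: TRS
# consensus sequences are written in IUPAC notation (e.g. "GATAAG", "TGACTCA").
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(consensus):
    return consensus.upper().translate(COMPLEMENT)[::-1]


def strand_patterns(consensus):
    """Forward and reverse-complement forms; a palindrome yields one pattern."""
    consensus = consensus.upper()
    return {consensus, reverse_complement(consensus)}


def site_counts(consensus, seqs):
    """Return (matches, candidate positions) over both strands of seqs."""
    patterns = strand_patterns(consensus)
    k = len(consensus)
    matches = 0
    for pat in patterns:
        # Zero-width lookahead so overlapping occurrences are all counted.
        regex = re.compile("(?=" + "".join(IUPAC[c] for c in pat) + ")")
        matches += sum(sum(1 for _ in regex.finditer(s.upper())) for s in seqs)
    positions = len(patterns) * sum(max(len(s) - k + 1, 0) for s in seqs)
    return matches, positions


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), used here as an assumed significance test."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    if k <= lam:
        # The tail is not small here, so 1 - P(X <= k - 1) is accurate enough.
        cdf = sum(math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
                  for i in range(k))
        return max(0.0, 1.0 - cdf)
    # Sum the tail directly so very small p-values are not lost to cancellation.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total, i = 0.0, k
    while term > 0.0 and term > 1e-16 * total:
        total += term
        i += 1
        term *= lam / i
    return min(1.0, total)


def enrichment(consensus, region_seqs, background_seqs):
    """Observed vs expected TRS count in a region under a background frequency.

    Expected count = background per-position frequency x candidate positions
    in the region (a simple, assumed null model).
    """
    observed, region_positions = site_counts(consensus, region_seqs)
    bg_matches, bg_positions = site_counts(consensus, background_seqs)
    expected = bg_matches / bg_positions * region_positions
    ratio = observed / expected if expected > 0 else math.inf
    return observed, expected, ratio, poisson_upper_tail(observed, expected)
```

Passing intergenic sequences as `region_seqs` and either whole-genome or exon sequences as `background_seqs` mirrors the two backgrounds the abstract compares; running `enrichment` on two alternative consensus forms of the same TRS and comparing their ratios is one way to illustrate the closing suggestion about evaluating consensus accuracy.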

Journal: Bioinformatics

Volume 20, Issue 4

Publication date: 2004